Visual Diagnostics for Algorithmic Cartridge Case Comparisons

Joseph Zemmels, Susan VanderPlas, Heike Hofmann

Acknowledgements

Funding statement

This work was partially funded by the Center for Statistics and Applications in Forensic Evidence (CSAFE) through Cooperative Agreement 70NANB20H019 between NIST and Iowa State University, which includes activities carried out at Carnegie Mellon University, Duke University, University of California Irvine, University of Virginia, West Virginia University, University of Pennsylvania, Swarthmore College and University of Nebraska, Lincoln.

Background

Cartridge Case Comparisons

  • Determine whether two cartridge cases were fired from the same firearm.

  • Cartridge Case: metal casing containing primer, powder, and projectile

  • Breech Face: rear wall of the firearm chamber against which the cartridge case is forced during firing

  • Breech Face Impressions: markings left on cartridge case surface by the breech face during the firing process

Current Practice

Impression Comparison Algorithms

National Research Council (2009):

“[T]he decision of a toolmark examiner remains a subjective decision based on unarticulated standards and no statistical foundation for estimation of error rates”

President’s Council of Advisors on Science and Technology (2016):

“A second - and more important - direction is (as with latent print analysis) to convert firearms analysis from a subjective method to an objective method. This would involve developing and testing image-analysis algorithms for comparing the similarity of tool marks on bullets [and cartridge cases].”

We discuss an image-analysis algorithm to compare 3D topographical images of cartridge cases

  • Visual diagnostics aid in understanding what the algorithm does “under the hood.”

Cartridge Case Comparison Algorithms

Ames I Study

  • Baldwin et al. (2014) collected cartridge cases from 25 Ruger SR9 pistols
  • Separated cartridge cases into quartets: 3 known-match + 1 unknown source

  • Match if fired from the same firearm, Non-match if fired from different firearms

  • 216 examiners tasked with determining whether the unknown cartridge case originated from the same pistol as the known-match cartridge cases

  • Baldwin et al. (2014) interested in the “false positive” and “false negative” error rates of these examiners

    • False Positive: Classifying a non-match as a match

    • False Negative: Classifying a match as a non-match

  • 0.80% overall error rate (26 out of 3,268)

  • 1.01% false positive rate (22 out of 2,178 comparisons)

  • 0.37% false negative rate (4 out of 1,090 comparisons)

Cartridge Case Data

  • 3D topographic images from Cadre TopMatch scanner (Weller et al. 2015)

  • x3p file contains surface measurements at the micrometer (“micron”) level
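
As a concrete starting point, the sketch below reads an x3p file with the x3ptools R package and inspects the surface matrix; the file name is a placeholder.

```r
# Minimal sketch: read and inspect an x3p scan using the x3ptools package.
# "scan1.x3p" is a placeholder file name.
library(x3ptools)

x3p <- read_x3p("scan1.x3p")

# Surface measurements are stored as a numeric matrix (typically in meters);
# header.info records the spacing between neighboring measurements.
dim(x3p$surface.matrix)
x3p$header.info$incrementX  # lateral resolution, on the order of a micron
```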

Cartridge Case Comparison Algorithms

Obtain an objective measure of similarity between two cartridge cases

  • Step 1: Independently pre-process scans to isolate breech face impressions
  • Step 2: Compare two cartridge cases to extract a set of numerical features that distinguish between matches vs. non-matches
  • Step 3: Combine numerical features into a single similarity score (e.g., predicted probability of a match)

Examiner takes similarity score into account during an examination

Challenging to know how/when these steps work correctly

Step 1: Pre-process

Isolate region in scan that consistently contains breech face impressions

[Figure showing a pre-processed cartridge case scan]
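
To give a sense of the operations involved, here is a minimal pre-processing sketch for a surface matrix (in microns): level the scan by removing a fitted plane, then trim extreme values. This is an illustrative simplification, not the actual pre-processing routine.

```r
# Illustrative pre-processing sketch (not the actual routine): level the surface
# by removing a global plane fit, then trim extreme depths that are unlikely to
# be breech face impressions.
level_and_trim <- function(surface, trim_quantiles = c(0.01, 0.99)) {
  idx <- which(!is.na(surface), arr.ind = TRUE)
  fit <- lm(surface[idx] ~ idx[, 1] + idx[, 2])   # global plane fit
  leveled <- surface
  leveled[idx] <- residuals(fit)                  # remove the overall tilt
  cutoffs <- quantile(leveled, trim_quantiles, na.rm = TRUE)
  leveled[leveled < cutoffs[1] | leveled > cutoffs[2]] <- NA
  leveled
}

# Example with a small synthetic, tilted surface
set.seed(1)
surface <- outer(1:50, 1:50, function(i, j) 0.05 * i + 0.02 * j) +
  matrix(rnorm(2500, sd = 0.5), 50, 50)
processed <- level_and_trim(surface)
```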

How do we know when a scan is adequately pre-processed?

Step 2: Compare Full Scans

  • Registration: Determine rotation and translation to align two scans
  • Cross-correlation function (CCF) measures similarity between scans

    • Choose the rotation/translation that maximizes the CCF
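
The sketch below illustrates registration by grid search over translations only; the actual algorithm also searches over rotations and uses a faster, FFT-based CCF computation.

```r
# Simplified registration sketch: evaluate the correlation between overlapping
# regions over a grid of translations and keep the best one. Rotation is
# handled analogously in the real algorithm but omitted here.
ccf_at_shift <- function(ref, quest, dx, dy) {
  nr <- nrow(ref); nc <- ncol(ref)
  r_ref <- max(1, 1 + dy):min(nr, nr + dy)   # overlapping rows after shifting
  c_ref <- max(1, 1 + dx):min(nc, nc + dx)   # overlapping columns
  a <- ref[r_ref, c_ref]
  b <- quest[r_ref - dy, c_ref - dx]
  ok <- !is.na(a) & !is.na(b)
  if (sum(ok) < 100) return(NA_real_)        # require a minimal overlap
  cor(a[ok], b[ok])
}

register_translation <- function(ref, quest, max_shift = 20) {
  grid <- expand.grid(dx = -max_shift:max_shift, dy = -max_shift:max_shift)
  grid$ccf <- mapply(ccf_at_shift, dx = grid$dx, dy = grid$dy,
                     MoreArgs = list(ref = ref, quest = quest))
  grid[which.max(grid$ccf), ]                # translation maximizing the CCF
}
```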

Step 2: Compare Cells

  • Split one scan into a grid of cells that are each registered to the other scan

  • For a matching pair, we assume that cells will agree on the same rotation & translation
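
A minimal sketch of the cell-based idea: split one scan into a grid of cells, each of which is then registered to the other scan (using, for example, the toy register_translation() above). The 4x4 grid is an illustrative choice, not the algorithm's actual setting.

```r
# Split a scan (numeric matrix) into an n_cells x n_cells grid of cells; each
# cell is then registered to the other scan separately. For a matching pair,
# the per-cell registrations should agree with one another.
split_into_cells <- function(scan, n_cells = 4) {
  row_breaks <- round(seq(1, nrow(scan) + 1, length.out = n_cells + 1))
  col_breaks <- round(seq(1, ncol(scan) + 1, length.out = n_cells + 1))
  cells <- list()
  for (i in seq_len(n_cells)) {
    for (j in seq_len(n_cells)) {
      cells[[length(cells) + 1]] <-
        scan[row_breaks[i]:(row_breaks[i + 1] - 1),
             col_breaks[j]:(col_breaks[j + 1] - 1)]
    }
  }
  cells
}

cells <- split_into_cells(matrix(rnorm(400), 20, 20))
length(cells)  # 16 cells
```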

Why does the algorithm “choose” a particular registration?

Step 3: Score

  • Our approach: predicted probability of a match using a statistical model

What factors influence the final similarity score?

Visual Diagnostics

Visual Diagnostics for Algorithms

  • A number of questions arise out of using comparison algorithms

    • How do we know when a scan is adequately pre-processed?

    • Why does the algorithm “choose” a particular registration?

    • What factors influence the final similarity score?

  • We wanted to create tools that are useful for answering these questions

    • Well-constructed visuals are intuitive and powerful

X3P Plot

  • Map quantiles of surface values to a divergent color scheme

  • Emphasizes extreme values in scan that may need to be removed during pre-processing

  • Allows for comparison of multiple scans on the same color scheme
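
A minimal ggplot2 sketch of the idea, assuming the surface is a numeric matrix in microns: values are positioned along a divergent palette according to their empirical quantiles so that extreme depths stand out. This is an illustration, not our plotting implementation.

```r
# Quantile-based divergent color mapping for a surface matrix (illustrative).
library(ggplot2)

plot_x3p_quantiles <- function(surface) {
  df <- expand.grid(x = seq_len(ncol(surface)), y = seq_len(nrow(surface)))
  df$value <- as.vector(t(surface))
  ggplot(df, aes(x, y, fill = value)) +
    geom_raster() +
    scale_fill_gradientn(
      colours = c("#67001f", "#d6604d", "#f4a582", "#f7f7f7",
                  "#92c5de", "#4393c3", "#053061"),   # divergent scheme
      values = scales::rescale(
        quantile(df$value, probs = c(0, 0.05, 0.25, 0.5, 0.75, 0.95, 1),
                 na.rm = TRUE)
      )
    ) +
    coord_fixed() +
    theme_minimal()
}

# e.g., plot_x3p_quantiles(x3p$surface.matrix * 1e6)  # meters to microns
```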

X3P Plot Pre-processing Example

Useful for diagnosing when scans need additional pre-processing

Comparison Plot

  • Separate aligned scans into similarities and differences

  • Useful for understanding a registration

  • Similarities: Element-wise average of the two scans, restricted to elements that are less than 1 micron apart

  • Differences: Elements of both scans that are at least 1 micron apart
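
The sketch below implements this 1-micron split for two aligned scans stored as numeric matrices (in microns) on the same grid.

```r
# Split two aligned scans into similarities and differences using a 1-micron
# threshold, following the definitions above.
split_similarities_differences <- function(scan_a, scan_b, threshold = 1) {
  close_enough <- abs(scan_a - scan_b) < threshold
  list(
    similarities = ifelse(close_enough, (scan_a + scan_b) / 2, NA),  # element-wise average
    diff_a = ifelse(close_enough, NA, scan_a),   # differing elements of each scan
    diff_b = ifelse(close_enough, NA, scan_b)
  )
}

# Example with two noisy versions of the same synthetic surface
set.seed(2)
base <- matrix(rnorm(100), 10, 10)
parts <- split_similarities_differences(base + rnorm(100, sd = 0.6),
                                        base + rnorm(100, sd = 0.6))
```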

Full Scan Comparison Plot

Cell Comparison Plot


Translating Visuals to Statistics

  • Quantify what our intuition says should be true for (non-)matching scans
  • For a matching cartridge case pair…

    1. There should be more similarities than differences

    2. The different regions should be relatively small

    3. The surface values of the different regions should follow similar trends

Similarities vs. Differences Ratio

  1. There should be more similarities than differences

Ratio between number of similar vs. different observations

Different Region Size

  2. The different regions should be relatively small

Size of the different regions

Different Region Correlation

  3. The surface values of the different regions should follow similar trends

Correlation between the different regions of the two scans
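
Continuing the sketch from the Comparison Plot section, these three intuitions reduce to simple statistics of the similarity/difference split; pooling all differing elements (rather than tracking connected regions) is a simplification.

```r
# Compute the three visual-diagnostic statistics from the output of
# split_similarities_differences() (simplified: differing elements are pooled
# rather than grouped into connected regions).
comparison_statistics <- function(parts) {
  n_similar   <- sum(!is.na(parts$similarities))
  n_different <- sum(!is.na(parts$diff_a))
  list(
    sim_diff_ratio   = n_similar / n_different,                   # 1. more similarities than differences
    diff_proportion  = n_different / (n_similar + n_different),   # 2. different regions relatively small
    diff_correlation = cor(as.vector(parts$diff_a),               # 3. similar trends in different regions
                           as.vector(parts$diff_b),
                           use = "pairwise.complete.obs")
  )
}

comparison_statistics(parts)  # `parts` from the earlier sketch
```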

Statistics vs. Similarity Scores

  • Useful for predicting the score returned by a comparison algorithm

[Figure showing feature value vs. class probability]

Automatic Cartridge Evidence Scoring (ACES) Algorithm

Automatic Cartridge Evidence Scoring

  • Comparison algorithm that pre-processes, compares, and scores two cartridge case scans
  • Computes 19 numerical features for each cartridge case pair
  • Predicts match probability for an unknown cartridge case pair using trained statistical model

Visual Diagnostic Features

  • Use visual diagnostic statistics discussed earlier as numerical features
  • Features:

    • From the full scan comparison:

      • Similarities vs. differences ratio

      • Average and standard deviation of different region sizes

      • Different region correlation

    • From cell-based comparison:

      • Average and standard deviation of similarities vs. differences ratios

      • Average and standard deviation of different region sizes

      • Average different region correlation

[Feature densities here]

Registration-based Features

  • For a matching cartridge case pair…

    • Correlation should be large at the full scan and cell levels

    • Cells should “agree” on a particular registration

  • Compute summary statistics of full-scan and cell-based registration results

  • Features:

    • Correlation from full scan comparison

    • Mean and standard deviation of correlations from cell comparisons

    • Standard deviation of cell-based registration values (horizontal/vertical translations & rotation)
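
A minimal sketch of these summaries, assuming cell_results is a data frame with one row per cell and illustrative columns ccf, dx, dy, and theta holding each cell's estimated correlation, translations, and rotation.

```r
# Registration-based feature summaries (column names are illustrative).
registration_features <- function(full_scan_ccf, cell_results) {
  c(
    full_scan_ccf = full_scan_ccf,            # correlation from the full-scan comparison
    cell_ccf_mean = mean(cell_results$ccf),   # large for matches
    cell_ccf_sd   = sd(cell_results$ccf),
    dx_sd         = sd(cell_results$dx),      # small when cells agree on a registration
    dy_sd         = sd(cell_results$dy),
    theta_sd      = sd(cell_results$theta)
  )
}

# Example with synthetic per-cell results for a "matching" pair
set.seed(3)
cell_results <- data.frame(ccf = runif(64, 0.5, 0.9), dx = rnorm(64, 3, 1),
                           dy = rnorm(64, -2, 1), theta = rnorm(64, 15, 2))
registration_features(full_scan_ccf = 0.8, cell_results)
```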

[Feature density plot here]

Density-based Features

  • For a matching cartridge case pair…

    • Cells should “agree” on a particular registration

    • The estimated registrations between the two comparison directions should be opposites

  • Features:

    • Average cluster size

    • DBSCAN cluster indicator

    • Absolute sum of density-estimated rotations

    • Root sum of squares of the cluster-estimated translations
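
A minimal sketch of the density-based idea using the dbscan package to cluster per-cell translation estimates; the eps and minPts values, and the way the features are summarized, are illustrative simplifications.

```r
# Cluster per-cell registration estimates with DBSCAN (Ester et al. 1996);
# a single dense cluster suggests the cells agree on one registration.
library(dbscan)

cluster_cells <- function(cell_results, eps = 2, minPts = 4) {
  est <- dbscan(as.matrix(cell_results[, c("dx", "dy")]), eps = eps, minPts = minPts)
  in_cluster <- est$cluster > 0                 # cluster id 0 marks noise points
  list(
    cluster_found = any(in_cluster),            # cluster indicator feature
    mean_cluster_size = if (any(in_cluster)) mean(table(est$cluster[in_cluster])) else 0,
    translation_rss = if (any(in_cluster)) {    # root sum of squares of cluster-estimated translation
      sqrt(mean(cell_results$dx[in_cluster])^2 + mean(cell_results$dy[in_cluster])^2)
    } else NA_real_
  )
}

cluster_cells(cell_results)  # `cell_results` from the previous sketch
```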

ACES Statistical Model

  • Compute 19 features for each pairwise comparison

  • Use 510 cartridge cases collected by Baldwin et al. (2014) to fit a logistic regression classifier model

  • Train logistic regression model using 21,945 pairwise comparisons from 210 scans

    • Classify each pair as a “match” or “non-match” based on estimated match probability

    • Select model that minimizes overall error rate while balancing false positive & false negative rates

  • Test model on 44,850 pairwise comparisons from 300 scans

    • Compute false positive and false negative rates

    • Consider distributions of match probabilities for truly matching and non-matching pairs
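
A minimal sketch of the model-fitting step, assuming train_features and test_features are data frames containing the 19 features plus a logical match column with ground truth (the names are illustrative).

```r
# Fit a logistic regression classifier on the training comparisons and predict
# match probabilities for the test comparisons (data frame names are assumed).
fit <- glm(match ~ ., data = train_features, family = binomial())

test_prob <- predict(fit, newdata = test_features, type = "response")

# Classify with a probability cutoff chosen on the training data to balance
# false positive and false negative rates (0.5 shown only as a placeholder).
test_class <- test_prob > 0.5
```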

Cartridge Case Classification Results

| Source | False Positive (%) | False Negative (%) | Overall Error (%) |
|---|---|---|---|
| Baldwin et al. (2014) | 1.01 | 0.37 | 0.80 |
| ACES, Balanced FP/FN | 1.82 | 1.82 | 1.82 |

  • ACES False Positive: 359 out of 19,746 non-match comparisons

  • ACES False Negative: 40 out of 2,199 match comparisons
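
Continuing the sketch above, the reported rates correspond to simple proportions of misclassified comparisons.

```r
# False positive, false negative, and overall error rates from predicted classes
# and ground truth (e.g., 359 / 19,746 non-match comparisons gives about 1.82%).
false_positive_rate <- sum(test_class & !test_features$match) / sum(!test_features$match)
false_negative_rate <- sum(!test_class & test_features$match) / sum(test_features$match)
overall_error_rate  <- mean(test_class != test_features$match)
```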

Notes:

  • We score every pairwise comparison individually (1-to-1); Baldwin et al. (2014) had examiners compare quartets (3 known-match cartridge cases to 1 unknown)

  • We consider classification accuracy as a means of selecting/comparing models. In practice, the examiner would use the estimated match probability as part of their examination.

Match Probability Distributions

[Plot of match probabilities for test data]

Conclusions

Conclusions & Future Work

  • Automatic comparison algorithms are useful for obtaining numerical measures of similarity for two pieces of evidence

  • Visual diagnostics help explain what happens “under the hood” of comparison algorithms

  • Our visual diagnostic tools aid in understanding each step of a cartridge case comparison algorithm

    • Also useful by themselves to visually compare cartridge case evidence

  • The Automatic Cartridge Evidence Scoring (ACES) algorithm shows promise at measuring the similarity between cartridge cases

  • Need additional “stress tests” (different ammunition/firearms, degradation levels, etc.)

  • Explore optimization criteria other than balancing false positive and false negative error rates

Software

References

AFTE Criteria for Identification Committee. 1992. “Theory of Identification, Range Striae Comparison Reports and Modified Glossary Definitions.” AFTE Journal 24 (3): 336–40.
Baldwin, David P, Stanley J Bajic, Max Morris, and Daniel Zamzow. 2014. “A Study of False-Positive and False-Negative Error Rates in Cartridge Case Comparisons.” Fort Belvoir, VA: Ames Laboratory; Defense Technical Information Center. https://doi.org/10.21236/ADA611807.
Ester, Martin, Hans-Peter Kriegel, Jörg Sander, and Xiaowei Xu. 1996. “A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise.” In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, 226–31. KDD’96. Portland, Oregon: AAAI Press.
National Research Council. 2009. Strengthening Forensic Science in the United States: A Path Forward. Washington, D.C.: The National Academies Press.
President’s Council of Advisors on Science and Technology. 2016. “Forensic Science in Criminal Courts: Ensuring Scientific Validity of Feature-Comparison Methods.” Washington, DC: Executive Office of the President.
Song, John. 2013. “Proposed NIST Ballistics Identification System (NBIS) Based on 3D Topography Measurements on Correlation Cells.” American Firearm and Tool Mark Examiners Journal 45 (2): 11. https://tsapps.nist.gov/publication/get_pdf.cfm?pub_id=910868.
Tai, Xiao Hui, and William F. Eddy. 2018. “A Fully Automatic Method for Comparing Cartridge Case Images.” Journal of Forensic Sciences 63 (2): 440–48. http://doi.wiley.com/10.1111/1556-4029.13577.
Thompson, Robert. 2017. Firearm Identification in the Forensic Science Laboratory. National District Attorneys Association. https://doi.org/10.13140/RG.2.2.16250.59846.
Vorburger, T V, J H Yen, B Bachrach, T B Renegar, J J Filliben, L Ma, H G Rhee, et al. 2007. “Surface Topography Analysis for a Feasibility Assessment of a National Ballistics Imaging Database.” NIST IR 7362. Gaithersburg, MD: National Institute of Standards and Technology. https://doi.org/10.6028/NIST.IR.7362.
Weller, Todd, Marcus Brubaker, Pierre Duez, and Ryan Lilien. 2015. “Introduction and Initial Evaluation of a Novel Three-Dimensional Imaging and Analysis System for Firearm Forensics.” AFTE Journal 47 (January): 198.
Zhang, Hao, Jialing Zhu, Rongjing Hong, Hua Wang, Fuzhong Sun, and Anup Malik. 2021. “Convergence-Improved Congruent Matching Cells (CMC) Method for Firing Pin Impression Comparison.” Journal of Forensic Sciences 66 (2): 571–82. https://doi.org/10.1111/1556-4029.14634.